Endogenous viral elements reveal associations between a non-retroviral RNA virus and symbiotic dinoflagellate genomes

Endogenous viral elements (EVEs) offer insight into the evolutionary histories and hosts of contemporary viruses. This study leveraged DNA metagenomics and genomics to detect and infer the host of a non-retroviral dinoflagellate-infecting +ssRNA virus (dinoRNAV) common in coral reefs. As part of the Tara Pacific Expedition, this study surveyed 269 newly sequenced cnidarians and their resident symbiotic dinoflagellates (Symbiodiniaceae), associated metabarcodes, and publicly available metagenomes, revealing 178 dinoRNAV EVEs, predominantly among hydrocoral-dinoflagellate metagenomes. Putative associations between Symbiodiniaceae and dinoRNAV EVEs were corroborated by the characterization of dinoRNAV-like sequences in 17 of 18 scaffold-scale and one chromosome-scale dinoflagellate genome assembly, flanked by characteristically cellular sequences and in proximity to retroelements, suggesting potential mechanisms of integration. EVEs were not detected in dinoflagellate-free (aposymbiotic) cnidarian genome assemblies, including stony corals, hydrocorals, jellyfish, or seawater. The pervasive nature of dinoRNAV EVEs within dinoflagellate genomes (especially Symbiodinium), as well as their inconsistent within-genome distribution and fragmented nature, suggest ancestral or recurrent integration of this virus with variable conservation. Broadly, these findings illustrate how +ssRNA viruses may obscure their genomes as members of nested symbioses, with implications for host evolution, exaptation, and immunity in the context of reef health and disease.

ndogenous viral elements, or "EVEs," arise when whole or fragmented viral genomes are incorporated into host cell germlines. Once integrated, EVEs may propagate across successive host generations, potentially becoming fixed in a population through natural selection or drift 1,2 . Therefore, the presence and content of EVEs can provide clues into the evolutionary relationships among host species and shed light on ancient and modern virus-host interactions 3 . To date, most EVEs described in metazoan and plant genomes are retroviral, as this viral group must integrate their genome (as a provirus) into the genome of the host to replicate. Retroviruses thus possess and encode all of the molecular machinery (e.g. reverse transcriptases, integrases) required to integrate autonomously 4 . Remarkably, however, sequences from viruses that do not encode reverse transcriptases or exploit integration as a component of an obligate replication strategy-even viruses with no DNA stage-have also recently been detected as EVEs in diverse eukaryotic genomes [5][6][7][8][9][10][11] . These non-retroviral RNA EVEs have been reported in hosts ranging from unicellular algae to chiropteran (bat) genomes [12][13][14][15][16][17][18] . Though the mechanisms behind non-retroviral integration continue to be explored, viral sequences may be introduced via nonhomologous recombination and repair, through interactions with host-provisioned integrases and reverse transcriptases supplied on mobile elements (e.g. retroelements), or by utilizing co-infecting viruses 6,7 .
Endogenization of any viral sequence (including non-retroviral EVEs) may have positive, neutral or negative effects on a host [19][20][21] . While many EVEs are functionally defective or deleterious and ultimately removed from a population via purifying selection, retained EVEs may remodel the genomic architecture of their hosts or introduce sources of genetic innovation later coopted for host function (i.e. exaptation 22,23 ). Such 'domesticated' EVEs can be co-opted by hosts and utilized as regulatory elements, transcription factors, or functional proteins with purposes ranging from organism development to synaptic plasticity in the mammalian brain [24][25][26][27][28] . In particular, non-retroviral EVEs potentially serve as antiviral prototypes that help hosts combat infection by exogenous viruses currently circulating in the population 14,[29][30][31] . Mechanisms underpinning EVE-derived immunity can include cell receptor interference, nucleic acid sequence recognition (e.g., RNAi), or even replication sabotage through production of faulty virus proteins from EVEs 32 . If expressed, EVEs may have a significant influence on the health, physiology and/or behavior of their hosts in natural and experimental systems 31,33,34 .
Investigating the distribution, sequence identity, and function of EVEs can yield insight into virus-host interactions across generations. EVEs catalogue a subset of the viruses that a host lineage has encountered and can link homologous extant viruses to contemporary hosts or known disease states 31 . Because integrated elements may accrue mutations at a slower rate than exogenous viral genomes 6,35 , EVEs can fill gaps in virus-host networks and act as synapomorphies, indicating the minimum time that a virus may have interacted with a host. As 'genomic fossils', EVEs have helped paleovirologists date the minimum origin of Circoviridae 36 , Hepadnaviridae 37 , Bornaviridae 38 , Flaviviridae 39 , Lentiviridae 40,41 , and Spumaviridae 42 infections within metazoans 2,24,35,43,44 (reviewed by Barreat and Katzourakis in 2022 45 ).
Coral holobiontsthe cnidarian animal and its resident microbial assemblage, including dinoflagellates in the family Symbiodiniaceae, bacteria, archaea, fungi, and virusesare an ecologically and economically valuable, multipartite non-model system 46,47 . Symbiodiniaceae are key obligate nutritional symbionts of corals and support their hosts in the construction of reef frameworks 48 . However, environmental stress can break down coral-Symbiodiniaceae partnerships, resulting in bleachingthe mass loss of Symbiodiniaceae cells 49 . Some bleaching signs (paling of a coral colony) are hypothesized to also result from viral lysis of Symbiodiniaceae 21,50-54 , but direct evidence supporting this hypothesis remains limited. Overall, the role of viruses in coral colony health and disease requires further examination.
Non-retroviral +ssRNA dinoRNAV sequences were first reported in stony corals based on five metatranscriptomic sequences and corroborated by Symbiodiniaceae EST libraries 55 . Subsequent studies indicated that similar +ssRNA viruses are commonly detected in coral RNA viromes and metatranscriptomes, as well as via targeted amplicon assays 54,[56][57][58] . These viruses exhibit synteny and significant homology to Heterocapsa circularisquama RNA virus (HcRNAV 57 ), the sole recognized representative of the genus Dinornavirus and a known pathogen of free-living dinoflagellates 59 . Both HcRNAV and dinoRNAV sequences detected in coral holobiont tissues contain two ORFsa Major Capsid Protein (MCP) and RNA dependent RNA polymerase (RdRp). Furthermore, icosahedral virus-like particle (VLP) arrays resembling HcRNAV (but with 40% smaller individual particle diameters) have been imaged in the Symbiodiniaceae-dense coral gastrodermis tissue and in Symbiodiniaceae themselves 60 . Levin et al. (2017) 57 assembled the 5.2 kb genome of a putative dinoRNAV from a poly(A)-selected metatranscriptome generated from cultured Symbiodinium. The assembly contained a 5' dinoflagellate spliced leader ("dinoSL" 61 ) a component of >95% of Symbiodiniaceae mRNAs, speculated to illustrate molecular mimicryand exhibited >1000-fold higher expression in a thermosensitive Cladocopium C1 population relative to a thermotolerant population of this Symbiodiniaceae strain at ambient temperatures (27°C 48,57 ). Together, the findings from these studies suggest that Symbiodiniaceae are target hosts of reef-associated dinoRNAVs.
This study (1) systematically searched for putative endogenized dinoRNAVs in metagenomes from in situ (symbiotic) coral colonies and seawater, as well as in available genomes of Symbiodiniaceae and aposymbiotic (symbiont-free) cnidarians, (2) investigated the evolutionary relationship of putative dinoRNAV EVEs to exogenous reef-associated dinoRNAV sequences, and (3) made preliminary inferences regarding the distribution and possible function of these dinoRNAV EVEs based on their detection, prevalence, and genomic context.

Results and discussion
Evidence of endogenized dinoRNAVs in coral holobiont metagenomes. Putative dinoRNAV EVEs were detected in metagenomes generated from 42 cnidarian holobionts out of 269 sampled across the South Pacific Ocean (Supplementary Data 1). The majority of endogenized dinoRNAVs were identified in hydrocoral metagenomes (Millepora spp.; 70.5%, n = 105) which predominantly harbored Symbiodinium dinoflagellates but EVE-like sequences were also observed in scleractinian coral metagenomes (Pocillopora spp.; 29.5%, n = 15.) which predominantly harbored Cladocopium and Durusdinium dinoflagellates (Fig. 1a, c). No dinoRNAV-like sequences were detected among Porites spp. metagenomes (Figs. 1, 2). Hydrocoral metagenomes were sequenced at equivalent depths as scleractinian corals and had comparable levels of annotation ( Supplementary Fig. 1, Supplementary Data 2); thus, higher dinoRNAV EVE prevalence in hydrocoral libraries was likely not a result of methodological bias. Of the 11 evaluated South Pacific islands, dinoRNAV EVEs were identified in samples from eight (Guam, Gambier, Moorea, Cook, Niue, Malpelo, Coïba, and Las Perlas), spanning 18 unique sites (Fig. 1b, d). Among Pocillopora spp. metagenomes, putative dinoRNAV EVEs were only  identified on the Central American coast (CAMR, Coastal Pacific Longhurst Province) and were absent in Melanesia, Micronesia, and Polynesia; at these latter sites, dinoRNAVs were largely found in Millepora hydrocoral metagenomes. Importantly, endogenized dinoRNAV open reading frames (ORFs) appeared to be immediately adjacent to ORFs identified as dinoflagellate (typically Symbiodiniaceae) genes-they were not proximal to coral genes or those of other cellular organisms abundant in these metagenomes (Supplementary Data 3).
We examined the Symbiodiniaceae ITS2 profiles associated with each metagenome and found that putative dinoRNAV EVEs were primarily associated with Symbiodinium, Cladocopium, and Durusdinium, which exhibited variation on both host and regional scales (Fig. 1c, Supplementary Data 4). DinoRNAV EVEs were more common in Symbiodinium-dominated cnidarians (F 2,1044 = 25.8, p < 0.0001, nested ANOVA; Supplementary  Fig. 2, Supplementary Data 5) relative to cnidarians hosting other Symbiodiniaceae genera, regardless of host. This suggested that dinoRNAV integration may be particularly recurrent or conserved within the genus Symbiodinium (Fig. 1).
To determine if these putative viral integrations were specific to cnidarian holobiont metagenomes and ensure that they were not artifacts of shared sample processing and sequencing procedures of the Tara Pacific pipeline, we also analyzed seawater metagenomes and publicly available metagenomes from the stony coral-dinoflagellate holobiont, Acropora spp. (Supplementary Data 1B, [62][63][64][65][66][67][68][69]. Examination of 120 Tara Oceans pelagic seawater metagenomes 70 yielded no sequences sharing homology to dinoRNAVs. The concentration of Symbiodiniaceae cells within cnidarian tissues is considerably higher than that of the surrounding seawater [71][72][73][74] . On average, only 1.46 ± 0.08% of assembled contigs in seawater metagenomes were annotated as Symbiodiniaceae. Thus, lack of detection of dinoRNAV-like sequences from seawater metagenomes is likely due to reduced genomic signal of Symbiodiniaceae in the water column, rather than a lack of EVEs associated with Symbiodiniaceae lineages in seawater. However, it also must be noted that these Tara Oceans seawater metagenomes were not collected concurrently with coral samples 75 . Analysis of the 30 non-Tara Acropora holobiont metagenomes identified 29 more putative dinoRNAV EVEs ( Fig. 2; Supplementary Data 6). These dinoRNAV EVEs were again neighboring dinoflagellate ORFs. While the Caribbean Acropora metagenomes analyzed contained too few reads to resolve the dominant Symbiodiniaceae present, earlier studies of the same coral colonies identified Symbiodinium spp. as the primary symbiont present 76 .
The identification of endogenized dinoRNAV-like sequences in cnidarian holobiont metagenomes, combined with the proximity of dinoRNAV-like ORFs to dinoflagellate-like sequences across metagenomes harboring diverse dinoflagellate consortia, collectively indicate that dinoRNAV EVEs are widespread among Symbiodiniaceae genera ( Fig. 2

cyan dots).
Endogenized DinoRNAVs detected in Symbiodiniaceae genomes. To further test the hypothesis that dinoRNAVs on reefs infect dinoflagellate symbionts and not cnidarians, we examined 18 scaffold-scale genome assemblies representing the dinoflagellate families Symbiodiniaceae and Suessiaceae as well as 25 cnidarian genomes spanning 10 genera (Supplementary Data 1B 62-69 ; Fig. 2; Table 1). Alignments revealed no evidence of endogenized dinoRNAVs in any of the 151,782 aposymbiotic (dinoflagellate-free) cnidarian scaffolds. In contrast, the same approach uncovered 351 (of 593,433) dinoflagellate scaffolds with evidence of endogenized dinoRNAVs ( Fig. 2; Table 1). The identified 351 dinoRNAV EVE-containing scaffolds were observed across 17 of the 18 dinoflagellate genome assemblies (Table 1). DinoRNAV EVEs were also observed in two assemblies from the free-living dinoflagellate genus, Polarella (family Suessiaceae), which is closely related to the family Symbiodiniaceae, and served as an outgroup in this study 77,78 . Interestingly, assemblies belonging to Symbiodinium, the most ancestral Symbiodiniaceae genus 48 , contained a higher number of scaffolds with putative dinoRNAV EVEs (x=28.11, stdev=10.7) relative to assemblies of other Symbiodiniaceae genera (x=8.71, stdev=11; Fig. 2 cyan dots; Table 1). This result may clarify why observations of dinoRNAV-like ORFs were more common in metagenomes dominated by Symbiodinium (Fig. 1c). The dinoflagellate genome assembly with no detected dinoRNAV EVEs belonged to a relatively incomplete assembly of Cladocopium C15, which had the second lowest N50 and lowest BUSCO completeness score of all genomes examined (completeness 11.6%, relative to the average 24.54%; Table 1, Supplementary Data 7). The lower coverage/completeness of the Cladocopium C15 assembly indicates a reduced window into this genome. It is therefore possible that when a more complete assembly is generated, dinoRNAV EVE-like sequences will be detectable from this dinoflagellate. However, a linear model suggested that there was no relationship between dinoRNAV EVE detection and assembly statistics (i.e. query length, N50, or completeness; see Supplementary Data 8 for linear model output). Instead, dinoflagellate genus was the strongest predictor of dinoRNAV detection in a genome (LM results: Genus F = 5.74, p = 0.012) and dinoRNAV detections were significantly higher in Symbiodinium than Cladocopium genomes (pairwise estimated difference = −27.77 ± 5.91, p = 0.01; Supplementary Data 9). Furthermore, since we were unable to detect dinoRNAV EVEs in Porites metagenomes-a coral species primarily harboring Cladocopium C15 symbiontswe hypothesize that dinoRNAV endogenization was either less common in this lineage of Symbiodiniaceae or integrations have been lost over evolutionary time 79,80 .
Incomplete ORFs and possible duplications indicate endogenization of DinoRNAVs. The repeated observation of putative dinoRNAV EVEs in dinoflagellate scaffolds and contigs from metagenomes and genomes suggests these sequences are either (1) conserved sequence artifacts of Symbiodiniaceae-dinoRNAV interactions, and/or (2) evidence of highly prevalent dinoflagellate viruses, commonly integrated and propagated via their singlecelled hosts. If the observed dinoRNAV-like sequences represent active infections capable of generating virions during egress, we would, at minimum, expect essential ORFs associated with replication (RNA-dependent RNA polymerase, RdRp) and virion structure (Major Capsid Protein, MCP) to be endogenized on the same scaffold. We would additionally expect to observe overall conservation of ORF length/composition (with a lack of internal stop codons or substantial deletions) when aligning the dinoRNAV-like sequences detected here with known exogenous dinoRNAV sequences.
However, both DIAMOND and gene prediction analyses generally depicted dinoRNAV-like ORFs in isolation on separate scaffolds. While 28 MCP and 73 RdRp dinoRNAV ORFs were annotated, both ORFs were present on a Symbiodiniaceae scaffold potentially representing whole dinoRNAV genome integrationsin only 14 instances. Thirteen of these 14 were from Symbiodinium genomes, whereas one scaffold was from Breviolum minutum, a member of the second most ancestral dinoflagellate genus (Table 1) 48 . To assess the conservation of putative dinoRNAV EVE sequence length/composition, we aligned the genomic and single ORF EVEs to reference exogenous dinoRNAV sequences. The reference genome for reef-associated dinoRNAVs is~5 Kbp long and contains a 1,071 bp noncoding region between ORFs, with a 124-nucleotide internal ribosomal binding site 57 . In this study, for 13 of the scaffolds in which dinoRNAV ORFs were detected, the putative noncoding region between the MCP and RdRp EVEs ranged from~200-800 bp (except for a scaffold belonging to S. linucheae CCMP2456, which contained a~79 kbp noncoding region, and was excluded in further alignments). No internal ribosomal binding sites were detected within the putative dinoRNAV EVEs identified in dinoflagellate genomes. A nucleotide-based alignment to Levin et al.'s (2017) 57 reference dinoRNAV genome indicated that the putative dinoRNAV EVEs presented here contained substantial insertions and/or deletions ( Supplementary Fig. 3). Translated exogenous dinoRNAV MCP ORFs are reported to be~358 aa in length 57 ; Fig. 3 top sequences), but dinoRNAV-like MCP sequences recovered in this study ranged from 116-605aa in length. Furthermore, comparisons of these endogenous MCPs to exogenous reference sequences revealed internal stop codons and overall low similarity (Fig. 3). Amino acid-based alignment of endogenous dinoRNAV MCPs to metatranscriptome-and amplicon-generated exogenous reference sequences 57,58 revealed indels and regions of low similarity between three conserved regions across both endogenous and exogenous MCP sequences (red boxes in Fig. 3). Interestingly, multiple whole dinoRNAV integrations were sometimes observed in a single dinoflagellate genome. For example, genome assemblies of four different S. microadriacticum strains contained two or three whole dinoRNAV EVEs each (Table 1; Fig. 2). Pairwise alignments measuring shared nucleotide identity of whole dinoRNAV EVEs across Symbiodiniaceae scaffolds revealed that the S. microadriaticum genomes and the S. necroappetens genome share two whole genome dinoRNAV EVEs (provisionally dinoRNAV-A and dinoRNAV-B; Supplementary Fig. 3; Clustal-Omega) 81 . S. microadriaticum dinoRNAV-B was identical in all strains and shared 97% identity with the S. necroappetens dinoRNAV-B, yet proximal genes varied (Supplementary Data 10, 11). Importantly, the inconsistent composition and fragmented nature of both the genomic and single ORF dinoRNAV EVEs reported here supports the hypothesis that these sequences are not capable of generating replicative virions and are best interpreted as multiple integrations of dinoRNAVs into a host genome.
A Potential Mechanism for dinoRNAV Endogenization: Host-Provisioned Retroelements. To assess if general genomic "neighborhoods" are conserved across dinoRNAV integrations (e.g. site location and synteny) and to better understand the genes proximal to EVEs on Symbiodiniaceae genomes, a chromosomescale Symbiodinium microadriaticum genome assembly was evaluated (Fig. 4). The highest quality dinoflagellate genome assembly currently available revealed dinoRNAV-like ORFs on 18 of 94 chromosomes, with at least one RdRp on each, and some with multiple (two with n = 2 RdRps, three with n = 3 RdRps). On three of the chromosomes (# 30, 35, and 74), there were predicted ORFs annotated as dinoRNAV MCPs in close proximity to a RdRp ORF (separated by noncoding regions 319-656nt), indicative of a potential full-length dinoRNAV genome integration. These results corroborate detections of multiple genomic dinoRNAV EVEs in scaffold-scale assemblies of Symbiodinium microadriaticum genomes (Supplementary Fig. 3). The higherresolution S. microadriaticum chromosome-level assembly facilitated the identification of an additional dinoRNAV genomic EVE (n = 4 for chromosome-level vs. n = 3 for scaffold-level, Supplementary Fig. 3), two of which were identified on Chromosome 74 and were separated by 2501 nucleotides. Of note,   82 reported a decreasing abundance and expression of genes towards the center of chromosomes (past 2Mpb of a telomere), where there was an increase in repetitive elements; this is where 26 of 29 putative dinoRNAV EVEs were identified in the chromosome-level assembly. Furthermore, ORFs neighboring integrations often varied widely, both in proximity and predicted function, from collagen and RNA binding protein to reverse transcriptase and non-LTR retrotransposable elements. These ORFs potentially contributed to the endogenization of dinoRNAV via mechanisms such as retrotransposition (Fig. 4,  Supplementary Data 10).
Retroposition through host-provisioned retroelements is one proposed mechanism of non-retroviral RNA virus integration into eukaryotic genomes 6,7 . An indicator of this form of integration is the nearby presence of a relict dinoflagellate spliced leader ("dinoSL"), a 22nt sequence located at the 5' end of mRNAs [83][84][85][86] . Such a sequence flanks the RdRp gene on some extant dinoRNAVs 57 . We detected dinoSLs within 500 bp of 23.1% (six of 26) endogenized RdRp ORFs on S. micoradriaticum chromosomes, providing support for retroposition of these viral elements into Symbiodiniaceae genomes (Supplementary Data 11,12). DinoRNAV gene integration may be facilitated by any of three major orders of retroelements associated with Symbiodiniaceae, including long terminal repeat (LTR) retrotransposons, short interspersed nuclear elements (SINEs), and long interspersed nuclear elements (LINEs 83,87,88 ). Evidence suggests that these LINEs are common and non-active remnants of an ancient proliferation of LINEs that preceded the diversification of Suessiales 78,83,89 . Symbiodinium contains more LINEs relative to other Symbiodiniaceae genera, comprising 74.10-171.31 Mbp of Symbiodinium genomes, relative to an average of 7.48 Mbp of the genomes of in other genera, indicating the loss of these retroelements across speciation events 82,83 . The loss of LINEs in more recently derived Symbiodiniaceae genera coincides with a decrease in dinoRNAV EVE detection in these genomes (Table 1). Conversely, the genomes of Polarella, the psychrophilic and freeliving outgroup from which Symbiodiniaceae diversified~160 million years ago, are LINE-rich and generally have comparable numbers of dinoRNAV EVEs to Symbiodinium (Table 1 48,77,78,83 ). Together, this suggests that LINE activity during speciation may have facilitated dinoRNAV integration and the resulting EVEs may constitute dinornavirus "fossils." This may explain their degree of sequence fragmentation and relatively low sequence similarity to modern extant dinoRNAVs (Fig. 3).
LINE-mediated retroposition is further supported by the observation of a LINE reverse transcriptase homolog~17 kbp upstream of a RdRp EVE with a relict dinoSL on chromosome 45 (Supplementary Data 12) and a LINE retroelement 95 bp downstream of an EVE recovered from a Pocillopora metagenome (Fig. 4). Additionally,~40% of annotated ORFs (35 of 88 annotated proteins) proximal to dinoRNAV ORFs on S. microadriaticum chromosomes were similar to non-LTR elements seen in other eukaryotic genomes sometimes <300 bp 5' upstream (Supplementary Data 12, Supplementary Fig. 4). Collectively, these findings implicate host provisioned retroelements, such as LINEs, as facilitators of dinoRNAV gene integration.

DinoRNAV EVEs show homology to extant exogenous viruses.
Modern, exogenous dinoRNAVs (Order: Sobelivirales) are highly divergent and hypothesized to form chronic infections within dinoflagellate hosts 54,55,57,58 . This chronic infection strategy likely provides opportunities for retroelement-driven endogenization into host genomes. Because many EVEs evolve at the rate of the host genome, rather than at the much faster rate of exogenous +ssRNA viral genomes, EVEs can serve as a snapshot of viral ancestry 90 . We compared translated dinoRNAV EVEs to exogenous dinoRNAVs and other Dinornavirus taxa to assess the conservation of EVEs, the potential for host utilization of these elements, and their relatedness to contemporary dinoRNAVs. We found that amino acid translations of endogenous dinoRNAV MCP sequences contained conserved motifs observed in the exogenous MCP sequences (e.g. Regions 1-3 in Fig. 3), yet the associated phylogeny was highly polyphyletic along inferred ancestral nodes (Fig. 5a). Endogenous MCP ORFs also appear to be evolving under neutral selection (dN/dS=0.958).
Endogenized dinoRNAV MCP form their own clades within the MCP tree, each closely related to specific clades consisting of extant dinoRNAVs or environmental (i.e. unclassified) sobeliviruses with similar conserved motifs. The majority of dinoRNAV MCP EVEs shared similarity to extant MCPs identified from unfractionated stony coral holobionts via amplicon sequencing 58 ; these sequences formed an independent, disorganized clade (Fig. 5a clade containing yellow and blue sequences), relative to those recovered from dinoflagellate transcriptomes or those of other invertebrate hosts. Likewise, dinoRNAV RdRp EVEs identified via metagenomics appear most similar to HcRNAV, the defining member of family Alvernaviridae and a protist pathogen, further supporting the affiliation of this EVE with a dinoflagellate host. MCP and RdRp ORFs putatively derived from the same dinoflagellate genomes often shared clades (clades containing multiple blue or green sequences in Fig. 5a, b), perhaps indicative of duplications within genomes or multiple integration events of particular dinoRNAV lineages within host genera. The detection of putative dinoRNAV RdRp ORFs within Polarella genomes is therefore indicative of either the antiquity of dinoRNAV-dinoflagellate interactions and/or a propensity for recent dinoRNAV integration across Dinophyceae families. However, the exclusion of the P. glacialis dinoRNAV-RdRp from RdRps of other dinoflagellate clades (pink, Fig. 5B) further illustrates the congruence between EVEs and their host genomes. Overall, the evident homology to contemporary Dinornaviruses support these integrations as Alvernaviridae within order Sobelivirales.
The expression and functional potential of endogenized dinoR-NAV elements (if any) remains unclear. With no isolated Symbiodiniaceae-infecting dinoRNAV strains available, investigation into EVE functionality is limited to in silico approaches. Sequence data mining efforts identified RNA sequences either sharing sequence similarity with dinoRNAVs, or containing whole dinoRNAV-like ORFs that also annotated as dinoflagellate transcripts (i.e. with cellular ORFs or sequence similarity) in seven out of nine publicly accessible dinoflagellate transcriptomes (Supplementary Data 13). Additionally, two transcripts from an exogenous dinoRNAV infection identified in Cladocopium transcriptomes carried MCP ORFs of +ssRNA viral sequences ('TR74740_c13-g1_i1' and 'TR74740_c13-g1_i2' 57 , red text in Fig. 5a) and form a clade with putative Symbiodinium dinoRNAV EVEs (Fig. 5). Likewise, the RdRp ORF of 'TR74740_c13-g1_i1' and the RdRp of 'GAKY01194223.1'a transcript derived from a cultured Symbiodinium microadriaticum A1 transcriptome-shared some areas of similarity to putative endogenous dinoRNAVs (Fig. 5b 57,91 . Importantly, both RNA transcripts also shared features characteristic of dinoflagellates, such as a 5' dinoSL 61 or dinoflagellate sequence space flanking the dinoRNAV itself 91 . Furthermore, 'TR74740_c13-g1_i1' appeared to be in the top 0.03% of expressed transcripts at under certain thermal conditions, and GAKY01194223.1 appeared to exhibit moderately differential expression at the extremes of temperature and ionic stress in a cultured host 57,91 . While viral RdRps have been leveraged by eukaryotes in multiple pathways 92 , the apparent fragmentation of the putative dinoRNAV EVEs in silico may indicate a role in triggering antiviral mechanisms within their hosts 31,93 . Given that the Symbiodinium genome contains all core RNAi protein machinery, including Argonaute and Dicer, and that GAKY01194223.1 folds into several hairpins (ΔG = −142.5 kcal/mol; Supplementary Fig. 5 examples), Symbiodiniaceae may use the putative EVE ncRNA identified here to develop host immunity against extant, exogenous dinoRNAVs. Furthermore, Symbiodiniaceae harboring dinoRNAV EVEs also contained numerous non-retroviral EVEs of other viral families (Supplementary Data 11, Fig. 7) in close proximity, such as Herpesviridae, Baculoviridae, Poxviridae, Iridoviridae, Phycodnaviridae, Pandoraviridae and Pithoviridae, ssDNA viruses of the family Shotokuvirae, -ssRNA viruses from the family Rhabdoviridae and +ssRNA viruses from the family Coronaviridae ( Supplementary Fig. 6). Metagenomes corroborate findings of similar RdRps from these viral families (Supplementary Fig. 6). This provides support for host-mediated integration (e.g. retroposition) as a means of defense for single celled organisms, though further research is needed 94 .
Conclusions. Over recent decades, endogenous viral elements (EVEs) have enabled investigators to better understand the evolutionary history of viruses ("paleovirology") in diverse terrestrial systems, uncovering ancient and modern virus-host interactions. Our study further demonstrates how in silico identification of EVEs can provide ecological context for enigmatic viral genomes in non-model, multipartite systems such as coral holobionts, impacting how we study coral reefs and their viral consortia. Here, we detected heritable integrations of multiple putative dinoRNAV genes in Symbiodiniaceae scaffolds from cnidarian metagenomes, as well as in diverse genomes of cultured Symbiodiniaceae; no integrations were detected from seawater metagenomes nor diverse aposymbiotic cnidarian genomes. The apparent pervasive nature of dinoRNAV-like sequences among dinoflagellate genomes (especially the genus Symbiodinium) suggests widespread and recurrent/ancestral integration of these EVEs. We propose that host-provisioned mechanisms drive dinoRNAV integration into single-celled dinoflagellate genomes as EVEs. The findings presented in this study further validate the dinoRNAV-Symbiodiniaceae virus-host pair, enhancing our understanding of ecologically and economically important cnidarian holobionts and opening the door to examining the role of EVEs in reef health. investigate reef health and ecology using multiple methods, including amplicon sequencing and metagenomics (see Pesant et al. 2020 95 and https://doi.org/10. 5281/zenodo.4068293 for coral reef sampling and processing methods). In this study, we explored metagenomes generated from hydrocorals (n = 60 Millepora), stony corals (n = 108 Porites, n = 101 Pocillopora) sampled from 11 islands (three replicate sites per island) across the South Pacific Ocean during the Tara Pacific Expedition for dinoRNAV EVEs (Fig. 1, Supplementary Data 1A, 1B 95 ). Amplicon libraries of the dinoflagellate Internal Transcribed Spacer 2 (ITS2) gene fragment were sequenced in tandem with the metagenomes, to characterize the dominant Symbiodiniaceae harbored by hydrozoan and stony coral colonies 95 .

Methods
To confirm that these dinoRNAV EVE sequences were affiliated with coral holobionts and reduce the possibility that they are technical artifacts, publicly available metagenome libraries were analyzed (Supplementary Data 1B). These additional libraries included 120 assembled pelagic water samples presumed to include pelagic dinoflagellate sequences from the Tara Oceans dataset (2009-2013 70 ) and 30 MiSeq metagenomes from unfractionated samples of the stony coral genus Acropora, which were processed and sequenced via a different pipeline (Supplementary Data 1B, Supplementary Fig. 7). Publicly accessible transcriptomes from nine Symbiodiniaceae assemblies (Supplementary Data 1B) were also queried to determine if dinoRNAV-like sequences were present in poly(A)-selected dinoflagellate transcriptomes and resembled EVEs in terms of proximal gene composition and presence of a characteristic pre-mRNA spliced leader (dinoSL) sequence (as in Levin et al, 2017 57 ). Details regarding the collection of samples, generation of metagenomes and associated Symbiodiniaceae amplicon libraries, and associated bioinformatic analyses are provided in Supplementary Fig. 7).
Metagenomic and transcriptomic scaffolds were annotated against a curated database of dinoRNAV-like sequences (Supplementary Data 14) via BLASTx (evalue < 1 × 10 −5 ; see Supplementary Fig. 7 for workflow 96 ). Alignments to the custom database with a bit score <50 and percent shared amino acid identity <30% were excluded from further analysis. A length penalty was not imposed during this step due to the limited length of assembled scaffolds (average N50 = 3341 ± 127 nt across all queried libraries). Open reading frames (ORFs) from selected scaffolds were called via Prodigal (v.2.6.3 97 ) and annotated against the NCBI-nr database (evalue < 0.001; DIAMOND v.2.0.6 98 ) to confirm homology to dinoRNAVs and to identify adjacent dinoflagellate sequences (e-value < 1 × 10 -5 , bit≥50). In the absence of complete ORFs (potentially due to the limited size of scaffolds, partial integrations, etc.), homology was confirmed through comparison of the initial alignments to the curated database and 300nt of upstream/downstream flanking sequences (bedtools v.2.30.0 99 ) against the NCBI-nr database (e-value < 0.001; DIAMOND v.2.0.6 98 ). This served as further curation and verification, as EVEs can exist in fragmented or degraded states. Non-normalized quality-controlled reads were mapped via bbmap (v.38.84 100 ), and putative EVEs were assessed for uniform read coverage across scaffolds, reducing the probability of chimeric assembly. RNA secondary structure was predicted via mfold (v.3.5 101 ).
We tested the relationship between the number of identified dinoRNAV EVEcontaining scaffolds, dinoflagellate genera, and genome quality metrics using a linear model. Model selection was performed with an F-test (package car, v.3.0-12) and assumptions were visually checked. Pairwise comparisons between genera were conducted using the package emmeans (v.1.7.2). Putative whole dinoRNAV-like genomes within scaffolds were identified based on the presence of MCP and RdRplike sequences on the same scaffold no further than 1.5 Kbp apart (Table 1; Supplementary Fig. 3). IRESPred 105 was utilized to identify internal ribosomal entry sites (IRES) with default parameters on putative dinoRNAV EVE with whole sequence integrations.
ORFs were predicted and annotated from dinoRNAV EVE-containing scaffolds and all dinoflagellate chromosomes using Prodigal 97 and MAKER2 annotation pipeline 106 with the AUGUSTUS gene prediction software 107 . Translated ORFs were then aligned to a hybrid database containing the UniProt/ Swiss-Prot database and protein version of RVDB (v.19; DIAMOND-BLASTp). ORFs on putative dinoRNAV EVE-containing scaffolds and chromosomes were further annotated using InterProScan (v5.48-83.0, Pfam analysis with default parameters) to identify sequences proximal to putative dinoRNAV integrations. The presence of dinoflagellate spliced leaders ("dinoSLs") were examined within 500nt of dinoRNAV EVEs using BLASTn with default parameters (except word size=9, excluding two ambiguous positions as specified in Gonzalez-Pech et al. 2021 83 ).
Phylogenetic analysis of dinoRNAV EVEs. Amino acid-based phylogenetic trees were generated with dinoRNAV EVE ORFs (MCP and RdRp) from scaffold-scale genomic assemblies, metagenomes, transcriptomes, and sequences from exogenous and closely related +ssRNA reference viruses (Supplementary Data 1 A, B, Supplementary Data 14). Sequences were aligned using the best fit algorithm determined by MAFFT (v7.464) 108 and reviewed and trimmed manually in MEGA (v7) 109 . Maximum-likelihood trees were generated with IQTREE2 110 using the model determined by ModelFinder 111 and 50,000 parametric bootstraps 112 with nearest neighbor interchange optimization. ORFs from the chromosome-level assembly for S. microadriaticum culture CCMP2467 were not included in the phylogeny in order to avoid redundancy with those from the analogous scaffoldlevel assembly. To calculate dN/dS, ORFs were aligned in Clustal Omega (v.1.2.4), refined in MUSCLE (v.3.6), before using pal2nal (v.14) for codon-based nucleic acid alignment. Evolutionary trajectory was then assessed via CODEML (PAML package, v.4.10.5).
Statistics and reproducibility. As indicated throughout the article, metagenomes (n = 269) and genomes (n = 18) served as technical replicates, and ORFs or full EVE sequences served as comparative ecoevolutionary units (replicates described in Table 1) when available. Negative controls (seawater metagenomes, coral host, etc) were also evaluated. All statistical packages are reported in methods or Supplementary Fig. 7.
Reporting summary. Further information on research design and collection permits are available in the Nature Research Reporting Summary linked to this article.

Data availability
Metadata are accessible in zenodo: https://zenodo.org/record/6299409#.Y-ClwuzMKml. Metagenomes are available via https://doi.org/10.5281/zenodo.7839794. Seawater metagenomes are available through the European bioinformatics institute (Tara Oceans; ERP001736) and NCBI (PRJEB1787). NCBI accession numbers for individual holobiont species metagenomes, genome assemblies and reference sequences can be found in Supplementary Data 1B, 3 and 14, respectively. Fig. 5 Phylogenies of dinoRNAV major capsid protein (MCP) and RNA-dependent RNA polymerase (RdRp) ORFs. a dinoRNAV Major Capsid Protein (MCP) ORF phylogeny. Maximum-likelihood tree of MCP amino acid sequences generated with a LG + F + G4 substitution model and 50,000 parametric bootstraps, illustrating the similarity of putative dinoRNAV EVEs (this study) to extant dinoRNAVs from stony coral colonies. b RNA-dependent RNA polymerase (RdRp) ORF phylogeny. Maximum-likelihood tree of RdRp amino acid sequences generated with a Blosum62 + G4 substitution model and 50,000 parametric bootstraps, demonstrating the similarity of metagenomic dinoRNAV EVE RdRps to RdRps of the sole recognized Dinornavirus, Heterocapsa circularisquama RNA virus (HcRNAV), as well as alignment to each other. ORFs were recovered from host metagenomes, transcriptomes, genomes, and extant +ssRNA reference viruses from amplicon libraries (a only). Both trees include Dinornavirus reference sequences and visualized in iTOL.